Signal processing method for cochlear implant

ABSTRACT

A signal processing method for a cochlear implant is performed by a speech processor and comprises a noise reduction stage and a signal compression stage. The noise reduction stage efficiently reduces noise in an electrical speech signal of normal speech. The signal compression stage compresses the signal so as to enhance the signals that stimulate the cochlear nerves of a patient with hearing loss. A patient who uses a cochlear implant performing the signal processing method of the present disclosure can accurately hear normal speech.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a divisional application of U.S. patent application Ser. No. 14/838,298 filed on Aug. 27, 2015. The entire disclosure of the prior application is considered to be part of the disclosure of the accompanying application and is hereby incorporated by reference.

BACKGROUND

The present disclosure relates to a signal processing method, and more particularly to a signal processing method applied in a cochlear implant.

A cochlear implant is a surgically implanted electronic device that provides a sense of sound to patients with hearing loss. Progress in cochlear implant technologies has enabled many such patients to enjoy a high level of speech understanding.

Noise reduction and signal compression are critical stages in the cochlear implant. For example, a conventional cochlear implant comprising multiple microphones can enhance the sensed speech volume. However, noise in the sensed speech is also amplified and compressed, which degrades speech clarity. In addition, the multiple microphones increase hardware cost.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure is illustrated by way of embodiments and accompanying drawings.

FIG. 1 is a circuit block diagram of a cochlear implant of the prior art.

FIG. 2 is a detailed circuit diagram showing a speech processor connected to a microphone and pulse generators of an exemplary embodiment of the present disclosure.

FIG. 3 is a schematic view of a single-layered DAE-based NR structure.

FIG. 4A shows an amplitude envelope of a clean speech signal; FIG. 4B shows an amplitude envelope of a noisy speech signal; FIG. 4C shows an amplitude envelope detected by a conventional log-MMSE estimator; FIG. 4D shows an amplitude envelope detected by a conventional KLT estimator; and FIG. 4E shows an amplitude envelope detected by the exemplary embodiment of the present disclosure.

FIG. 5 is a circuit block diagram of one channel of the speech processor of FIG. 2.

FIG. 6 is a waveform diagram of an amplitude envelope detected by an envelope detection unit of the speech processor of FIG. 2.

FIG. 7 is a waveform diagram of an output frame generated by the signal compressor of the speech processor of FIG. 2.

DETAILED DESCRIPTION

With reference to FIG. 1, which is a circuit block diagram, a basic conventional cochlear implant comprises a microphone 11, a speech processor 12, a transmitter 13, a receiver 14, a pulse generator 15, and an electrode array 16. The microphone 11 and the speech processor 12 are assembled to be mounted on a patient's ear. The transmitter 13 is adapted to be attached to the skin of the patient's head. The receiver 14, the pulse generator 15, and the electrode array 16 are implanted under the skin of the patient's head.

The microphone 11 is an acoustic-to-electric transducer that converts a normal speech sound into an electrical speech signal. The speech processor 12 receives the electrical speech signal and converts the electrical speech signal into multiple output sub-speech signals in different frequencies. The transmitter 13 receives the output sub-speech signals from the speech processor 12 and wirelessly sends the output sub-speech signals to the receiver 14. The pulse generator 15 receives the output sub-speech signals from the receiver 14, generates different electrical pulses based on the output sub-speech signals, and delivers the electrical pulses to the electrode array 16. The electrode array 16 includes a plurality of electrodes 161, and each of the electrodes 161 is electrically connected to different cochlear nerves of the patient's inner ear. The electrodes 161 output the electrical pulses to stimulate the cochlear nerves, such that the patient can hear something approximating normal speech.

The present disclosure provides a signal processing method for a cochlear implant and a cochlear implant using the same. The signal processing method is performed by a speech processor of the cochlear implant. The signal processing method, which includes a noise reduction stage and a signal compression stage, is configured to compress an input speech signal into a predetermined amplitude range.

In more detail, with reference to FIG. 2, the speech processor 12 has multiple channels including a first channel, a second channel, . . . , an i-th channel, . . . , and an n-th channel, wherein i and n are positive integers. Each one of the channels has a band-pass filter 121, an envelope detection unit 122, and a signal compressor 123. The envelope detection unit 122 is used to detect an amplitude envelope of a signal and can have a rectifier 124 and a low-pass filter 125. In the present disclosure, a noise reduction unit 126 is added. The noise reduction unit 126 is connected between the microphone 11 and the band-pass filters 121 of each one of the channels. In time domain, when the noise reduction unit 126 receives the electrical speech signal from the microphone 11, the noise reduction unit 126 segments the electrical speech signal into several continuous frames to reduce noise of the frames. For example, when a time length of the electrical speech signal is 3 seconds, the noise reduction unit 126 can segment the electrical speech signal into 300 continuous frames, wherein a time length of each one of the frames of the electrical speech signal is 10 milliseconds.
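By way of a non-limiting illustration, the segmentation described above can be sketched as follows. The function name `segment_frames` and the 16 kHz sampling rate are assumptions made only for this example; the disclosure does not specify a sampling rate or state whether frames overlap, so non-overlapping frames are assumed.

```python
def segment_frames(signal, sample_rate_hz, frame_ms=10):
    """Split a sampled signal into consecutive, non-overlapping frames."""
    frame_len = sample_rate_hz * frame_ms // 1000  # samples per frame
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(signal) // frame_len  # a trailing partial frame is dropped
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# A 3-second signal at an assumed 16 kHz sampling rate, cut into 10 ms frames.
sample_rate_hz = 16000
signal = [0.0] * (3 * sample_rate_hz)
frames = segment_frames(signal, sample_rate_hz)
print(len(frames), len(frames[0]))  # number of frames, samples per frame
```

With these assumed values the 3-second signal yields 300 frames of 160 samples each, which matches the 300 frames of 10 milliseconds given in the example above.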

Based on the above configuration, the band-pass filter 121 of each one of the channels sequentially receives the frames of the electrical speech signal from the noise reduction unit 126. The band-pass filter 121 of each one of the channels can preserve elements of each one of the frames of the electrical speech signal within a specific frequency band and remove elements beyond the specific frequency band from such frame. The specific frequency bands of the band-pass filters 121 of the channels are different from each other. Afterwards, the amplitude envelopes of the frames of the electrical speech signal are detected by the envelope detection units 122 and are provided to the signal compressors 123.
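As a non-limiting sketch, an envelope detection unit built from a rectifier followed by a low-pass filter might behave as shown below. The one-pole filter, the 400 Hz cutoff, the 16 kHz sampling rate, and the name `detect_envelope` are assumptions for illustration; the disclosure does not specify the filter type or its parameters.

```python
import math


def detect_envelope(frame, sample_rate_hz, cutoff_hz=400.0):
    """Full-wave rectify a frame, then smooth it with a one-pole low-pass filter."""
    # Smoothing coefficient of the one-pole filter for the assumed cutoff.
    coeff = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    envelope = []
    state = 0.0
    for sample in frame:
        rectified = abs(sample)               # role of the rectifier 124
        state += coeff * (rectified - state)  # role of the low-pass filter 125
        envelope.append(state)
    return envelope


# A 10 ms frame of a 1 kHz tone at an assumed 16 kHz sampling rate.
sample_rate_hz = 16000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / sample_rate_hz) for n in range(160)]
envelope = detect_envelope(tone, sample_rate_hz)
```

The output has one envelope value per input sample, is never negative, and varies more slowly than the rectified tone.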

The present disclosure relates to a noise reduction stage performed by the noise reduction unit 126 and a signal compression stage performed by the signal compressor 123. The noise reduction stage and the signal compression stage are described below.

1. Noise Reduction Stage

The noise reduction unit 126 can be implemented as a deep denoising autoencoder (DDAE)-based noise reduction (NR) structure. The DDAE-based NR structure is widely used in building a deep neural architecture for robust feature extraction and classification. In brief, with reference to FIG. 3, a single-layered denoising autoencoder (DAE)-based NR structure comprises an input layer 21, a hidden layer 22, and an output layer 23. The DDAE-based NR structure is a multiple-layered DAE-based NR structure comprising the input layer 21, the output layer 23, and multiple hidden layers 22. Because the parameter estimation and speech enhancement procedures of the DDAE are the same as those of the single-layered DAE, only the parameter estimation and speech enhancement for the single-layered DAE are presented, for ease of explanation. The same parameter estimation and speech enhancement procedures can be followed for the DDAE.

The input layer 21 receives an electrical speech signal y from the microphone 11 and segments the electrical speech signal y into a first noisy frame y₁, a second noisy frame y₂, . . . , a t-th noisy frame y_(t), . . . , and a T-th noisy frame y_(T), wherein T is a length of the current utterance. In other words, the present disclosure may segment an input speech signal, such as the electrical speech signal y, into a plurality of time-sequenced frames, such as the noisy frames y₁, y₂, . . . , and y_(T). For the elements in the t-th noisy frame y_(t), the noise reduction unit 126 reduces noise in the t-th noisy frame y_(t) to form a t-th clean frame x_(t). Afterwards, the output layer 23 sends the t-th clean frame x_(t) to the channels of the speech processor 12.

A relationship between the t-th noisy frame y_(t) and the t-th clean frame x_(t) can be represented as:

$x_{t} = W_{2}h\left( y_{t} \right) + b_{2}$   (equation (1))

wherein h(y_(t)) is a function including W₁ and b₁ in time domain; W₁ and W₂ are default connection weights in time domain; and b₁ and b₂ are default vectors of biases of the hidden layers 22 of the DDAE-based NR structure in time domain.

In another embodiment, the relationship between the t-th noisy frame y_(t) and the t-th clean frame x_(t) can be represented as:

$x_{t} = \mathrm{InvF}\left\{ W_{2}^{\prime}h^{\prime}\left( F\left\{ y_{t} \right\} \right) + b_{2}^{\prime} \right\}$   (equation (2))

wherein F{ } is a Fourier transform function that transforms the t-th noisy frame y_(t) from time domain to frequency domain; h′( ) is a function including W₁′ and b₁′; W₁′ and W₂′ are default connection weights in frequency domain; b₁′ and b₂′ are default vectors of biases of the hidden layers 22 of the DDAE-based NR structure in frequency domain; and InvF{ } is an inverse Fourier transform function to obtain the t-th clean frame x_(t).

According to experimental results, the t-th clean frame x_(t) obtained with the Fourier transform and the inverse Fourier transform as described above performs better than that obtained without the Fourier transform and the inverse Fourier transform.

For the time domain based method as shown in equation (1), h(y_(t)) can be represented as:

$\begin{matrix} {{h\left( y_{t} \right)} = {{\sigma \left( {{W_{1}y_{t}} + b_{1}} \right)} = \frac{1}{1 + {\exp \left\lbrack {- \left( {{W_{1}y_{t}} + b_{1}} \right)} \right\rbrack}}}} & \left( {{equation}\mspace{14mu} (3)} \right) \end{matrix}$

For the frequency domain based method shown in equation (2), h′(F{y_(t)}) can be represented as:

$\begin{matrix} {{h^{\prime}\left( {F\left\{ y_{t} \right\}} \right)} = {{\sigma \left( {{W_{1}^{\prime}F\left\{ y_{t} \right\}} + b_{1}^{\prime}} \right)} = \frac{1}{1 + {\exp \left\lbrack {- \left( {{W_{1}^{\prime}F\left\{ y_{t} \right\}} + b_{1}^{\prime}} \right)} \right\rbrack}}}} & \left( {{equation}\mspace{14mu} (4)} \right) \end{matrix}$

The parameters, including W₁, W₂, b₁ and b₂ in time domain or W₁′, W₂′, b₁′ and b₂′ in frequency domain, are preset in the speech processor 12.

For example, in time domain, the parameters including W₁, W₂, b₁ and b₂ in equations (1) and (3) are obtained from a training stage. Training data includes a clean speech sample u and a corresponding noisy speech sample v. Likewise, the clean speech sample u is segmented into several clean frames u₁, u₂, . . . , u_(T′), and the noisy speech sample v is segmented into several noisy frames v₁, v₂, . . . , v_(T′), wherein T′ is a length of a training utterance.

The parameters including W₁, W₂, b₁ and b₂ of equation (1) and equation (3) are optimized based on the following objective function:

$\begin{matrix} {\theta^{*} = {\arg \; {\min_{\theta}\left( {{\frac{1}{T^{\prime}}{\sum\limits_{t = 1}^{T^{\prime}}{{u_{t} - {\overset{\_}{u}}_{t}}}_{2}^{2}}} + {\eta \left( {{W_{1}}_{2}^{2} + {W_{2}}_{2}^{2}} \right)}} \right)}}} & \left( {{equation}\mspace{14mu} (5)} \right) \end{matrix}$

In equation (5), θ is a parameter set {W₁, W₂, b₁, b₂}, T′ is a total number of the clean frames u₁, u₂, . . . , u_(T′), and η is a constant used to control the tradeoff between reconstruction accuracy and regularization on connection weights (for example, η can be set as 0.0002). The training data including the clean frames u₁, u₂, . . . , u_(T′), and the training parameters W_(1-test), W_(2-test), b_(1-test) and b_(2-test) can be substituted into equation (1) and equation (3) to obtain a reference frame ū_(t). When the training parameters W_(1-test), W_(2-test), b_(1-test), and b_(2-test) make the reference frame ū_(t) approximate the clean frame u_(t), such training parameters are taken as the parameters W₁, W₂, b₁ and b₂ of equation (1) and equation (3). When the noisy speech sample v approximates the electrical speech signal y, the training result for the parameters W₁, W₂, b₁ and b₂ can be optimized. The optimization of equation (5) can be done by using any unconstrained optimization algorithm. For example, a Hessian-free algorithm can be applied in the present disclosure.

After training, optimized parameters including W₁, W₂, b₁ and b₂ are obtained, to be applied to equation (1) and equation (3) for real noise reduction application.
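As a non-limiting illustration, the value of the objective function of equation (5) for a candidate parameter set can be evaluated as sketched below. Two assumptions are made: the reference frame ū_(t) is computed by passing the noisy frame v_(t) through equations (1) and (3), which is the usual denoising autoencoder arrangement, and the squared norm of each weight matrix is taken as the sum of its squared entries. The function names and the toy values are hypothetical. The sketch only evaluates the objective; minimizing it would be left to an unconstrained optimizer such as the Hessian-free algorithm mentioned above.

```python
import math


def sigmoid(z):
    """Logistic function of equation (3), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(weights, vector, bias):
    """Return weights * vector + bias for a matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def forward(v_t, W1, b1, W2, b2):
    """Equations (1) and (3) applied to one frame."""
    return affine(W2, [sigmoid(a) for a in affine(W1, v_t, b1)], b2)


def squared_norm(matrix):
    """Sum of squared entries of a weight matrix (assumed norm)."""
    return sum(w * w for row in matrix for w in row)


def objective(params, clean_frames, noisy_frames, eta=0.0002):
    """Mean squared reconstruction error plus weight regularization."""
    W1, b1, W2, b2 = params
    error = 0.0
    for u_t, v_t in zip(clean_frames, noisy_frames):
        u_bar = forward(v_t, W1, b1, W2, b2)  # reference frame
        error += sum((u - ub) ** 2 for u, ub in zip(u_t, u_bar))
    penalty = squared_norm(W1) + squared_norm(W2)
    return error / len(clean_frames) + eta * penalty


# Toy training pair (hypothetical) that the toy parameters reconstruct exactly.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
W2 = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
b2 = [-1.0, -1.0]
loss = objective((W1, b1, W2, b2), [[0.0, 0.0]], [[0.0, 0.0]])
```

In this toy case the reconstruction error is zero, so the objective reduces to the regularization term alone.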

In frequency domain, the parameters including W₁′, W₂′, b₁′ and b₂′ of equation (2) and equation (4) are optimized based on the following objective function:

$\begin{matrix} {\theta^{*} = {\arg \; {\min_{\theta}\; \left( {{\frac{1}{T^{\prime}}{\sum\limits_{t = 1}^{T^{\prime}}{{u_{t} - {\overset{\_}{u}}_{t}}}_{2}^{2}}} + {\eta \left( {{W_{1}^{\prime}}_{2}^{2} + {W_{2}^{\prime}}_{2}^{2}} \right)}} \right)}}} & \left( {{equation}\mspace{14mu} (6)} \right) \end{matrix}$

In equation (6), θ is a parameter set {W₁′, W₂′, b₁′, b₂′}, T′ is a total number of the clean frames u₁, u₂, . . . , u_(T′), and η is a constant used to control the tradeoff between reconstruction accuracy and regularization on connection weights (for example, η can be set as 0.0002). The training data including the clean frames u₁, u₂, . . . , u_(T′), and the training parameters W_(1-test)′, W_(2-test)′, b_(1-test)′ and b_(2-test)′ can be substituted into equation (2) and equation (4) to obtain a reference frame ū_(t). When the training parameters W_(1-test)′, W_(2-test)′, b_(1-test)′ and b_(2-test)′ make the reference frame ū_(t) approximate the clean frame u_(t), such training parameters are taken as the parameters W₁′, W₂′, b₁′ and b₂′ of equation (2) and equation (4). When the noisy speech sample v approximates the electrical speech signal y, the training result for the parameters W₁′, W₂′, b₁′ and b₂′ can be optimized. The optimization of equation (6) can be done by using any unconstrained optimization algorithm. For example, a Hessian-free algorithm can be applied in the present disclosure.

After training, optimized parameters including W₁′, W₂′, b₁′ and b₂′ are obtained, to be applied to equation (2) and equation (4) for real noise reduction application.

With reference to FIGS. 4A to 4E, FIG. 4A shows an amplitude envelope of a clean speech signal and FIG. 4B shows an amplitude envelope of a noisy speech signal. FIG. 4C shows an amplitude envelope detected by a conventional log-MMSE (minimum mean square error) estimator. FIG. 4D shows an amplitude envelope detected by a conventional KLT (Karhunen-Loeve transform) estimator. FIG. 4E shows an amplitude envelope detected by the present disclosure. Comparing FIG. 4E with FIG. 4A, the result of detection in FIG. 4E most closely approximates the clean speech signal, which means the noise is removed. Comparing FIG. 4B with FIGS. 4C and 4D, the results of detection illustrated in FIGS. 4C and 4D are still noisy.

According to the experimental results mentioned above, the signal performances of the conventional log-MMSE estimator and the KLT estimator are not as good as those obtained by the procedures of the present disclosure. The procedures of the present disclosure have better noise reducing efficiency.

2. Signal Compression Stage

With reference to FIGS. 2 and 5, for the i-th channel of the speech processor 12, the signal compressor 123 receives an amplitude envelope of the t-th clean frame x_(t) within the specific frequency band from the noise reduction unit 126, through the band-pass filter 121 and the envelope detection unit 122. The amplitude envelope 30 of the t-th clean frame x_(t) is illustrated in FIG. 6. As shown in FIG. 6, the amplitude envelope 30 of the t-th clean frame x_(t) is time-varying.

The signal compressor 123 of the present disclosure comprises a compression unit 127, a boundary calculation unit 128, and a compression-factor-providing unit 129. The compression unit 127 and the boundary calculation unit 128 are connected to the envelope detection unit 122 to receive the amplitude envelope 30 of the t-th clean frame x_(t) in real-time. With reference to FIGS. 5 and 6, the boundary calculation unit 128 can detect an upper boundary UB and a lower boundary LB in the amplitude envelope of the t-th clean frame x_(t). The results of calculations as to the upper boundary UB and the lower boundary LB are transmitted to the compression-factor-providing unit 129. The upper boundary UB and the lower boundary LB can be calculated by:

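$UB = \bar{x}_{t} + \alpha_{0} \times \left( \max(x_{t}) - \bar{x}_{t} \right)$   (equation (7))

$LB = \bar{x}_{t} + \alpha_{0} \times \left( \min(x_{t}) - \bar{x}_{t} \right)$   (equation (8))

wherein α₀ is an initial value and x̄_(t) is a mean of the amplitude envelope of the t-th clean frame x_(t).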
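By way of a non-limiting worked example, the boundary calculation of equations (7) and (8) can be sketched as follows, where the mean of the amplitude envelope is the quantity about which both boundaries are formed. The function name `boundaries` and the sample envelope values are assumptions chosen for illustration.

```python
def boundaries(envelope, alpha_0):
    """Return (UB, LB) of one frame's amplitude envelope per equations (7) and (8)."""
    mean = sum(envelope) / len(envelope)
    upper = mean + alpha_0 * (max(envelope) - mean)  # equation (7)
    lower = mean + alpha_0 * (min(envelope) - mean)  # equation (8)
    return upper, lower


# Hypothetical envelope samples with mean 3.0, maximum 6.0, and minimum 1.0.
envelope = [1.0, 2.0, 3.0, 6.0]
ub, lb = boundaries(envelope, alpha_0=2.0)
print(ub, lb)
```

With α₀ = 2, the upper boundary is 3 + 2 × (6 − 3) = 9 and the lower boundary is 3 + 2 × (1 − 3) = −1, so the monitoring range is the envelope's own range stretched about its mean by the factor α₀.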

The compression unit 127 receives the amplitude envelope 30 of the t-th clean frame x_(t) and outputs a t-th output frame z_(t). Inputs of the compression-factor-providing unit 129 are connected to an input of the compression unit 127, an output of the compression unit 127, and an output of the boundary calculation unit 128. The compression-factor-providing unit 129 thus receives the results of calculating the upper boundary UB and the lower boundary LB from the boundary calculation unit 128, and the t-th output frame z_(t) from the compression unit 127. An output of the compression-factor-providing unit 129 is connected to the input of the compression unit 127, such that the compression-factor-providing unit 129 provides a compression factor α_(t) to the compression unit 127. The compression factor α_(t) is determined according to a previous compression factor α_(t-1), the upper boundary UB, the lower boundary LB, and the t-th output frame z_(t). In brief, the procedures herein may determine the compression factor α_(t) for a frame based on the frame's amplitude upper boundary UB and lower boundary LB. When the t-th output frame z_(t) is in a monitoring range between the upper boundary UB and the lower boundary LB, the compression factor α_(t) can be expressed as:

α_(t)=α_(t-1)+Δα₁   (equation (9))

where Δα₁ is a positive value (e.g., Δα₁=1).

In contrast, when the t-th output frame z_(t) is beyond the monitoring range, the compression factor α_(t) can be expressed as:

α_(t)=α_(t-1)+Δα₂   (equation (10))

where Δα₂ is a negative value (e.g., Δα₂=−0.1).

The t-th output frame z_(t) can be expressed as:

$z_{t} = \alpha_{t} \times \left( x_{t} - \bar{x}_{t} \right) + \bar{x}_{t}$   (equation (11))

where x̄_(t) is a mean of the amplitude envelope of the t-th clean frame x_(t).

According to equations (9) and (10), a present compression factor α_(t) is obtained from a previous compression factor α_(t-1). It can be understood that the compression factor α_(t) for the next frame can be modified based on the next frame's amplitude upper boundary UB and lower boundary LB. According to equation (11), the t-th output frame z_(t) is repeatedly adjusted according to the t-th clean frame x_(t) and the results of calculating UB, LB, and α_(t). According to the experimental results, the signal compression capability is good. As illustrated in FIG. 7, speech components A in the t-th output frame z_(t) are amplified. The speech components A even reach the upper boundary UB. In contrast, noise components B are not significantly amplified. Therefore, the t-th output frame z_(t) is enhanced to stimulate the cochlear nerves and the patient can accurately hear a spoken conversation.
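As a non-limiting sketch, the frame-by-frame adaptation of equations (7) to (11) can be arranged as shown below. The disclosure leaves open which output is compared with the monitoring range; this sketch assumes that a trial output computed with the previous factor α_(t-1) is checked, and that a frame is in range only when all of its samples lie between LB and UB. The function name `compress_frames`, the parameter `alpha_start`, and the sample values are hypothetical.

```python
def compress_frames(envelopes, alpha_0, alpha_start=None,
                    delta_up=1.0, delta_down=-0.1):
    """Apply equations (7) to (11) to a sequence of amplitude envelopes."""
    # Assumption: the starting factor defaults to the initial value alpha_0.
    alpha_prev = alpha_0 if alpha_start is None else alpha_start
    outputs, factors = [], []
    for x_t in envelopes:
        mean = sum(x_t) / len(x_t)
        ub = mean + alpha_0 * (max(x_t) - mean)  # equation (7)
        lb = mean + alpha_0 * (min(x_t) - mean)  # equation (8)
        # Trial output using the previous factor (assumed ordering).
        trial = [alpha_prev * (x - mean) + mean for x in x_t]
        in_range = all(lb <= z <= ub for z in trial)
        # Equation (9) when in range, equation (10) otherwise.
        alpha_t = alpha_prev + (delta_up if in_range else delta_down)
        outputs.append([alpha_t * (x - mean) + mean for x in x_t])  # equation (11)
        factors.append(alpha_t)
        alpha_prev = alpha_t
    return outputs, factors


# Four identical hypothetical envelopes, each with mean 3.0.
envelopes = [[1.0, 2.0, 3.0, 6.0]] * 4
outputs, factors = compress_frames(envelopes, alpha_0=2.0, alpha_start=0.5)
print(factors)
```

Under these assumptions the factor rises in steps of Δα₁ while the output stays inside the monitoring range and then backs off in steps of Δα₂ once the output exceeds it, so the output settles near the boundaries.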

What is claimed is:
 1. A signal processing method for a cochlear implant, the cochlear implant comprising a microphone and a speech processor, the signal processing method being executed by the speech processor and comprising: receiving an electrical speech signal from the microphone; segmenting the electrical speech signal into a plurality of time-sequenced noisy frames; reducing noise in each of the plurality of time-sequenced noisy frames to obtain a plurality of clean signal frames, the plurality of clean signal frames comprising a (t-1)-th clean frame x_(t-1) and a t-th clean frame x_(t); obtaining a (t-1)-th compression factor α_(t-1) according to the (t-1)-th clean frame x_(t-1); obtaining a t-th compression factor α_(t) for the t-th clean frame x_(t) according to the compression factor α_(t-1) and the t-th clean frame x_(t); obtaining a t-th output frame z_(t) based on the t-th compression factor α_(t); and outputting the t-th output frame z_(t).
 2. The signal processing method of claim 1, further comprising: obtaining a (t-1)-th amplitude envelope of the (t-1)-th clean frame x_(t-1) and calculating a (t-1)-th upper boundary and a (t-1)-th lower boundary of the (t-1)-th amplitude envelope; wherein the (t-1)-th compression factor α_(t-1) for the (t-1)-th clean frame x_(t-1) is obtained based on the (t-1)-th upper boundary and the (t-1)-th lower boundary.
 3. The signal processing method of claim 2, further comprising: obtaining a t-th amplitude envelope of the t-th clean frame x_(t) and calculating a t-th upper boundary and a t-th lower boundary of the t-th amplitude envelope; wherein the t-th compression factor α_(t) for the t-th clean frame x_(t) is obtained based on the compression factor α_(t-1), the t-th upper boundary and the t-th lower boundary.
 4. The signal processing method of claim 3, wherein when the t-th output frame z_(t) falls within a range between the t-th upper boundary and the t-th lower boundary, the t-th compression factor α_(t) is calculated by: α_(t)=α_(t-1)+Δα₁, and Δα₁ is a positive value.
 5. The signal processing method of claim 3, wherein when the t-th output frame z_(t) falls beyond a range between the t-th upper boundary and the t-th lower boundary, the t-th compression factor α_(t) is calculated by: α_(t)=α_(t-1)+Δα₂, and Δα₂ is a negative value.
 6. The signal processing method of claim 1, wherein the t-th output frame z_(t) is obtained by: $z_{t} = \alpha_{t} \times \left( x_{t} - \bar{x}_{t} \right) + \bar{x}_{t}$, and x̄_(t) is a mean of the t-th amplitude envelope of the t-th clean frame x_(t).
 7. The signal processing method of claim 1, wherein the t-th clean frame x_(t) is calculated by: $x_{t} = W_{2}h\left( y_{t} \right) + b_{2}$; wherein h(y_(t)) is a function including W₁ and b₁ in time domain, W₁ and W₂ are default connection weights in the time domain, and b₁ and b₂ are default vectors of biases of hidden layers of a deep denoising autoencoder based noise reduction (DDAE-based NR) structure in the time domain.
 8. The signal processing method of claim 7, wherein the h(y_(t)) is calculated by: ${h\left( y_{t} \right)} = {\frac{1}{1 + {\exp \left\lbrack {- \left( {{W_{1}y_{t}} + b_{1}} \right)} \right\rbrack}}.}$
 9. The signal processing method of claim 1, wherein the t-th clean frame x_(t) is calculated by: $x_{t} = \mathrm{InvF}\left\{ W_{2}^{\prime}h^{\prime}\left( F\left\{ y_{t} \right\} \right) + b_{2}^{\prime} \right\}$; wherein F{ } is a Fourier transform function to transfer the t-th noisy frame y_(t) from time domain to frequency domain; h′( ) is a function including W₁′ and b₁′; W₁′ and W₂′ are default connection weights in frequency domain; b₁′ and b₂′ are default vectors of biases of hidden layers of a DDAE-based NR structure in the frequency domain; and InvF{ } is an inverse Fourier transform function.
 10. The signal processing method of claim 9, wherein the h′(F{y_(t)}) is calculated by: ${h^{\prime}\left( {F\left\{ y_{t} \right\}} \right)} = {\frac{1}{1 + {\exp \left\lbrack {- \left( {{W_{1}^{\prime}F\left\{ y_{t} \right\}} + b_{1}^{\prime}} \right)} \right\rbrack}}.}$
 11. The signal processing method of claim 1, wherein the t-th upper boundary (UB) is calculated by $UB = \bar{x}_{t} + \alpha_{0} \times \left( \max(x_{t}) - \bar{x}_{t} \right)$, the t-th lower boundary (LB) is calculated by $LB = \bar{x}_{t} + \alpha_{0} \times \left( \min(x_{t}) - \bar{x}_{t} \right)$, and x̄_(t) is a mean of the t-th amplitude envelope of the t-th clean frame x_(t).